[BYOC] JSON Runtime with DNNL End-to-End Flow #5919
Conversation
maybe we should enable dnnl on CI?
yeah, we should. And we should remove the json_runtime_example.
Possibly out of scope for this PR, but is there a plan to support multiple functions/sub-graphs? Currently it looks like there is only support for a single DNNL sub-graph after the graph is partitioned?
We now have only one subgraph per module, but we could have many modules to support multiple subgraphs. Please see @mbaret's comments on this PR and the discussions for details.
Apologies, missed that, thanks
Looks good to me, but please wait on @lhutton1's approval to confirm this is usable with ACL.
It's working well so far, thanks! I think the API to add additional attributes and retrieve them from a JSON node is a bit convoluted, but I think this could always be improved at a later date.
Do we want to wait until dnnl is up on the CI? And what about @zhiics's comment below?
@masahi Thanks, my comment should be resolved with a follow-up PR.
We could wait for CI. It should have been updated to include the DNNL library already (#5936).
Wait for #5985.
Squashed commits:

* json runtime
* json dnnl WIP
* fix ArrayNode usages
* Support composite functions
* DNNL json runtime: conv2d/add/relu/dense/bn
* add a more complex example
* fix bias memory issue
* rebase to upstream
* merge to metadata module, remove the unused driver
* handle constant
* support composite functions
* support DNNL constant
* clean up
* Simplify dnnl user code
* GetDataSize
* fix dense bug
* improve cmake
* zero copy
* add unit test
* move json to contrib/json
* fix cmake
* lint
* max_digits10 for fp serialization
* only keep base getfunction
* fix lint
* zero copy for all data entries
* address comments
* enable ci
* address comment; fix bug
* address comment

Co-authored-by: Zhi Chen <chzhi@amazon.com>
RFC discussion: https://discuss.tvm.ai/t/byoc-runtime-json-runtime-for-byoc/6579
Currently, BYOC allows developers to choose either the C source module or their own customized module as the runtime for their accelerators. While we have provided an end-to-end execution flow for DNNL (i.e., MKL-DNN, OneDNN) using the C source module, we found that many developers prefer to use a customized module to better integrate with their own runtime engine, such as TensorRT. As a result, this PR (in collaboration with @zhiics) provides an end-to-end flow for DNNL using the JSON runtime. Some highlights:
We provide JSON codegen and JSON runtime base classes. The JSON codegen serializes a Relay subgraph to a JSON file, while the JSON runtime base provides deserialization methods to interpret subgraphs in JSON format. Developers can derive from the JSON codegen to easily customize their own codegen, or even use the JSON codegen directly if their runtime engine accepts standard TVM graph runtime JSON.
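To illustrate the idea, here is a minimal sketch of the serialize/deserialize round trip, using a plain node list with op names, inputs, and attributes. This is a toy illustration of the concept, not the actual TVM base classes:

```python
import json

# Toy sketch of the JSON codegen/runtime split: the "codegen" side flattens a
# subgraph into a list of JSON nodes; the "runtime" side parses that list back
# into a graph it can interpret.

def serialize_subgraph(nodes):
    """Serialize a list of node dicts (op, name, inputs, attrs) to JSON."""
    return json.dumps({"nodes": nodes}, indent=2)

def deserialize_subgraph(text):
    """Parse the JSON back and return the node list in topological order."""
    return json.loads(text)["nodes"]

# A toy conv2d -> relu subgraph.
subgraph = [
    {"op": "input",  "name": "data",  "inputs": [],        "attrs": {}},
    {"op": "conv2d", "name": "conv1", "inputs": ["data"],  "attrs": {"kernel": [3, 3]}},
    {"op": "relu",   "name": "relu1", "inputs": ["conv1"], "attrs": {}},
]

text = serialize_subgraph(subgraph)
assert deserialize_subgraph(text) == subgraph  # lossless round trip
```

A derived codegen would override how each operator's attributes are recorded, while the runtime side walks the node list and dispatches each op to its engine.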
We make a case study of leveraging the JSON runtime with DNNL. The DNNL JSON runtime now supports conv2d, dense, relu, batch_norm, and add, so it is able to run MobileNet. Note that the DNNL JSON runtime creates only one DNNL execution engine per subgraph, so it is much more efficient than the C source module version, which creates a DNNL engine for each operator in a subgraph.
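The efficiency claim can be sketched with a toy cost model (the constants below are made up for illustration, not measurements from this PR): engine construction is a fixed overhead, so paying it once per subgraph instead of once per operator amortizes it across the whole graph:

```python
# Toy cost model with illustrative constants (not measured values).
ENGINE_SETUP_MS = 2.0   # fixed cost of creating one DNNL engine
OP_RUN_MS = 0.5         # cost of executing one operator

def subgraph_latency(num_ops, engines):
    """Total latency when `engines` engine creations serve `num_ops` ops."""
    return engines * ENGINE_SETUP_MS + num_ops * OP_RUN_MS

per_op_engines = subgraph_latency(num_ops=30, engines=30)  # C source module style
single_engine = subgraph_latency(num_ops=30, engines=1)    # JSON runtime style
assert single_engine < per_op_engines
```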
DNNL JSON runtime handles constant tensors following the new mechanism in [RUNTIME] Introduce MetadataModule to separate code compilation/interpretation and weight initialization #5770.
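The constant-handling idea behind #5770 can be sketched as follows: the serialized graph records only the names of constant tensors, while the weights themselves travel separately (via the metadata module) and are bound by name before execution. This is a hypothetical illustration of the mechanism, not TVM's actual API:

```python
# Sketch: the graph JSON references constants by name only; the weights are
# shipped separately and bound to the runtime before the first run.

class ToyJSONRuntime:
    def __init__(self, const_names):
        self.const_names = const_names  # names recorded at codegen time
        self.consts = {}                # name -> tensor, filled at load time

    def load_constants(self, weights):
        """Bind externally stored weights (e.g. from a metadata module)."""
        for name in self.const_names:
            self.consts[name] = weights[name]

rt = ToyJSONRuntime(const_names=["conv1_weight", "conv1_bias"])
rt.load_constants({"conv1_weight": [[1.0]], "conv1_bias": [0.0], "unused": None})
assert set(rt.consts) == {"conv1_weight", "conv1_bias"}
```

Keeping weights out of the serialized graph means the same compiled artifact can be re-initialized without re-running codegen.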
DNNL codegen with the C source module will be preserved for illustration purposes, and we use CMake to control which DNNL codegen is used. Specifically, both USE_DNNL_CODEGEN ON and USE_DNNL_CODEGEN JSON enable the JSON runtime (this is the default runtime for DNNL). When following the tutorial, which we will update after this PR, users may set USE_DNNL_CODEGEN C_SRC to enable the C source module so that they can learn how it works.
Evaluation
This PR doesn't push the inference performance of the DNNL codegen/runtime. While we leave these issues as future work, here we list the performance problems we have observed:

- write_to_dnnl_memory performs a memory copy from the DLTensor (NDArray) to DNNL memory. This can be avoided by specifying the NDArray pointer when creating a DNNL memory (~5 ms overhead for MobileNet V2).
- We should set OMP_NUM_THREADS wisely. For example, MobileNet V2 with batch_norm simplified achieves 1400 ms on c5.2xlarge; however, if we run it with OMP_NUM_THREADS=16, the latency drops to 65 ms.
  - OMP_NUM_THREADS=16: 65 ms.
  - OMP_NUM_THREADS=16: 16 ms.

In summary, if we resolve all the issues mentioned above, the inference performance of MobileNet V2 on c5.2xlarge should be 16 - 5 = 11 ms.
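For reference, OMP_NUM_THREADS is an environment variable, so it only needs to be set in the benchmark process's environment before launch; the script name below is a placeholder, not a file from this PR:

```python
import os
import subprocess  # used for the (commented-out) placeholder launch

# Pin DNNL's OpenMP thread pool by setting OMP_NUM_THREADS in the child
# environment before launching the benchmark process.
env = dict(os.environ, OMP_NUM_THREADS="16")

# subprocess.run(["python3", "run_mobilenet_v2.py"], env=env)  # placeholder script
print("would launch with OMP_NUM_THREADS =", env["OMP_NUM_THREADS"])
```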
cc @masahi @mbaret @tqchen